Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)


The first column includes the line number in the original input file. The second column shows the functional consequence of the variant. The possible consequences are nonsynonymous SNV, synonymous SNV, frameshift insertion, frameshift deletion, nonframeshift insertion, nonframeshift deletion, frameshift block substitution, or nonframeshift block substitution. The third column includes the gene name, the transcript identifier, and the sequence change in the corresponding transcript.
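
As a quick check on such a file, the consequences in the second column can be tallied from the command line. The following is a minimal sketch, assuming the file is tab-delimited with the columns described above; the function name "count_consequences" and the three demo rows are made up for illustration and are not ANNOVAR output.

```shell
# Tally the functional consequences found in column 2 of a
# tab-delimited file laid out as described above.
# NOTE: count_consequences is a hypothetical helper, not part of ANNOVAR.
count_consequences() {
    awk -F'\t' '{ n[$2]++ }
                END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$1" |
        LC_ALL=C sort
}

# Made-up demo rows (illustration only): line number, consequence,
# and a gene:transcript:change annotation.
demo=$(mktemp)
printf 'line1\tnonsynonymous SNV\tGENE1:TX1:exon2:c.A10G:p.T4A,\n'  >  "$demo"
printf 'line4\tsynonymous SNV\tGENE2:TX2:exon1:c.C30T:p.A10A,\n'    >> "$demo"
printf 'line7\tnonsynonymous SNV\tGENE3:TX3:exon5:c.G55A:p.V19I,\n' >> "$demo"

# Print one line per consequence: name, a tab, then its count.
count_consequences "$demo"
rm -f "$demo"
```

To use it on real output, pass the path of the file produced by the gene-based annotation step instead of the demo file.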

The “annotate_variation.pl” tool has numerous arguments. Use “annotate_variation.pl -h” to display the complete list of arguments.

ANNOVAR provides the “table_annovar.pl” script as an easy way to annotate variants directly from a VCF file, so there is no need to convert the VCF file into an ANNOVAR input file first. The script takes a VCF file as input and generates a tab-delimited output file with many columns, each of which represents one set of annotations. It also generates a new output VCF file with the INFO field filled with annotation information.

./table_annovar.pl ../input/humanSNP.vcf humandb/ \
-buildver hg19 \
-out ../output/humanSNP2 \
-remove \
-protocol refGene,cytoBand,exac03,avsnp147,dbnsfp30a \
-operation g,r,f,f,f \
-nastring . \
-vcfinput

The “-remove” option removes all temporary files. The “-protocol” option is a comma-delimited string that specifies the annotation protocols; these strings typically represent database names in ANNOVAR. The “-operation” option tells ANNOVAR which operation to use for each of the protocols, in the same order, where “g” means gene-based, “gx” means gene-based with cross-reference annotation (from the -xref argument), “r” means region-based, and “f” means filter-based. The above ANNOVAR command generates three output files: “humanSNP2.avinput”, “humanSNP2.hg19_multianno.txt”, and “humanSNP2.hg19_multianno.vcf”. The first one is an ANNOVAR input file. The second one is the tab-delimited annotation file with one or more columns per annotation, and the last one is a VCF file with the annotations added to the INFO fields. Open each of these files and study their contents.
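
Because each entry of “-protocol” is matched by position with an entry of “-operation”, it can help to check the pairing before launching a long run. The following is a minimal sketch of such a check; the function name "describe_operations" is hypothetical (it is not part of ANNOVAR), and the operation codes it recognizes are only the four described above.

```shell
# Pair each -protocol entry with its -operation code and label the
# operation type. Returns non-zero if the two lists differ in length.
# NOTE: describe_operations is a hypothetical helper, not an ANNOVAR tool.
describe_operations() {
    printf '%s\n%s\n' "$1" "$2" | awk -F',' '
        BEGIN {
            kind["g"]  = "gene-based"
            kind["gx"] = "gene-based with cross-reference"
            kind["r"]  = "region-based"
            kind["f"]  = "filter-based"
        }
        # First input line: the protocol list.
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) prot[i] = $i }
        # Second input line: the operation list.
        NR == 2 {
            if (NF != n) exit 1
            for (i = 1; i <= NF; i++) {
                label = (($i) in kind) ? kind[$i] : "unrecognized"
                printf "%s\t%s\t%s\n", prot[i], $i, label
            }
        }' || {
        echo "error: -protocol and -operation differ in length" >&2
        return 1
    }
}

# The lists used in the command above: five protocols, five operations.
describe_operations "refGene,cytoBand,exac03,avsnp147,dbnsfp30a" "g,r,f,f,f"
```

For the lists above, the function prints one tab-delimited line per protocol, starting with refGene paired with the gene-based operation. A mismatched pair of lists, such as two protocols with a single operation, produces the error message and a non-zero return status.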

We can also try the annotation database that we created for SARS-CoV-2 by annotating “sarscov2.vcf”, which was generated in a previous variant calling example. Copy the file to the “input” directory for easy access, and then annotate it using the following command:

./table_annovar.pl ../input/sarscov2.vcf sarscov2db/ \
-buildver SARSCOV2 \
-out ../output/cov2SNP \
-remove \
-protocol refGene \